Search Result

Select

Parallel high utility pattern mining algorithm based on cluster partition

XING Shuning, LIU Fang'ai, ZHAO Xiaohui

Journal of Computer Applications 2016, 36 (8): 2202-2206. DOI: 10.11772/j.issn.1001-9081.2016.08.2202

Abstract （492）

PDF （844KB）（349）

Save

The exiting algorithms generate a lot of utility pattern trees based on memory when mining high utility patterns in large-scale database, leading to occupying more memory spaces and losing some high utility itemsets. Using Hadoop platform, a parallel high utility pattern mining algorithm, named PUCP, based on cluster partition was proposed. Firstly, the clustering method was introduced to divide the transaction database into several sub-datasets. Secondly, sub-datasets were allocated to each node of Hadoop to construct utility pattern tree. Finally, the conditional pattern bases of the same item which generated from utility pattern trees were allocated to the same node, reducing the crossover operation times of each node. The theoretical analysis and experimental results show that, compared with the mainstream serial high utility pattern mining algorithm named UP-Growth (Utility Pattern Growth) and parallel high utility pattern mining algorithm named HUI-Growth (Parallel mining High Utility Itemsets by pattern-Growth), the mining efficiency of PUCP is increased by 61.2% and 16.6% respectively without affecting the reliability of the mining results; and the memory pressure of large data mining can be effectively relieved by using Hadoop platform.

Reference | Related Articles | Metrics

Select

Improved algorithm for mining collaborative frequent itemsets in multiple data streams

WANG Xin, LIU Fang'ai

Journal of Computer Applications 2016, 36 (7): 1988-1992. DOI: 10.11772/j.issn.1001-9081.2016.07.1988

Abstract （438）

PDF （769KB）（396）

Save

In view of low memory usage rate and inefficient discovery for mining frequent itemsets in multiple data streams, an improved Mining Collaborative frequent itemsets in Multiple Data Stream (MCMD-Stream) algorithm was proposed. Firstly, the window sliding based on bit-sequence technique was utilized, which was a single-pass algorithm to find the potential and frequent itemsets. Secondly, Compressed frequent Pattern Tree (CP-Tree), which is similar to Frequent Pattern Tree (FP-Tree), was constructed to store the potential and frequent itemsets. And each node in the CP-Tree could generate the logarithmic tilted window to save the counts of frequent itemsets. Finally, the valuable frequent itemsets that appeared repeatedly in multiple data streams, namely collaborative frequent itemsets, were got. Compared to A-Stream and H-Stream algorithms, MCMD-Stream algorithm can improve the mining efficiency of collaborative frequent itemsets in multiple data streams, and also reduce the usage of the memory space. The experimental results show that MCMD-Stream algorithm can efficiently be applied to mine the collaborative frequent itemsets in multiple data streams.

Reference | Related Articles | Metrics

Select

Optimized clustering algorithm based on density of hierarchical division

PANG Lin, LIU Fang'ai

Journal of Computer Applications 2016, 36 (6): 1634-1638. DOI: 10.11772/j.issn.1001-9081.2016.06.1634

Abstract （501）

PDF （731KB）（410）

Save

The traditional clustering algorithms cluster the dataset repeatedly, and have poor computational efficiency on large datasets. In order to solve the problem, a novel algorithm based on hierarchy partition was proposed to determine the optimal number of clusters and initial centers of clusters, named Clusters Optimization based on Density of Hierarchical Division (CODHD). Based on hierarchical division, the computational process was studied, which did not need to cluster datasets repeatedly. First of all, all statistical values of clustering features were obtained by scanning dataset. Secondly, the data partitions of different level were generated from bottom-to-up, the density of each partition data point was calculated, and the maximum density point of each partition was taken as the initial center. At the same time, the minimum distance from the center to the higher density data point was calculated, the average of products' sum of the density of the center and the minimum distance was taken as the validity index and a clustering quality curve of different hierarchical division was built incrementally. Finally, the optimal number of clusters and the initial center of clusters were estimated corresponding to the partition of extreme points of curve. The experimental results demonstrate that, compared with Clusters Optimization on Preprocessing Stage (COPS), the proposed CODHD improved clustering accuracy by 30% and clustering algorithm efficiency at least 14.24%. The proposed algorithm has strong feasibility and practicability.

Reference | Related Articles | Metrics

Select

Computing global unbalanced degree of signed networks based on culture algorithm

ZHAO Xiaohui, LIU Fang'ai

Journal of Computer Applications 2016, 36 (12): 3341-3346. DOI: 10.11772/j.issn.1001-9081.2016.12.3341

Abstract （492）

PDF （864KB）（398）

Save

Many approaches which are developed to compute structural balance degree of signed networks only focus on the balance information of local network without considering the balance of network in larger scale and even from the whole viewpoint, which can't discover unbalanced links in the network. In order to solve the problem, a method of computing global unbalanced degree of signed networks based on culture algorithm was proposed. The computation of unbalanced degree was converted to an optimization problem by using the Ising spin glass model to describe the global state of signed network. A new cultural algorithm with double evolution structures named Culture Algorithm for Signed Network Balance (CA-SNB) was presented to solve the optimization problem. Firstly, the genetic algorithm was used to optimize the population space. Secondly, the better individuals were recorded in belief space and the situation knowledges were summarized by using greedy strategy. Finally, the situation knowledge was used to guide population space evolution. The convergence rate of CA-SNB was improved on the basis of population diversity. The experimental results show that, the CA-SNB can converge to the optimal solution faster and can be more robust than genetic algorithm and matrix transformation algorithm. The proposed algorithm can compute the global unbalanced degree and discover unbalanced links at the same time.

Reference | Related Articles | Metrics

Select

Efficient mining algorithm for uncertain data in probabilistic frequent itemsets

LIU Haoran, LIU Fang'ai, LI Xu, WANG Jiwei

Journal of Computer Applications 2015, 35 (6): 1757-1761. DOI: 10.11772/j.issn.1001-9081.2015.06.1757

Abstract （477）

PDF （911KB）（463）

Save

When using the way of pattern growth to construct tree structure, the exiting algorithms for mining probabilistic frequent itemsets suffer many problems, such as generating large number of tree nodes, occupying large memory space and having low efficiency. In order to solve these problems, a Progressive Uncertain Frequent Pattern Growth algorithm named PUFP-Growth was proposed. By the way of reading data in the uncertain database tuple by tuple, the proposed algorithm constructed tree structure as compact as Frequent Pattern Tree (FP-Tree) and updated dynamic array of expected value whose header table saved the same itemsets. When all transactions were inserted into the Progressive Uncertain Frequent Pattern tree (PUFP-Tree), all the probabilistic frequent itemsets could be mined by traversing the dynamic array. The experimental results and theoretical analysis show that PUFP-Growth algorithm can find the probabilistic frequent itemsets effectively. Compared with the Uncertain Frequent pattern Growth (UF-Growth) algorithm and Compressed Uncertain Frequent-Pattern Mine (CUFP-Mine) algorithm, the proposed PUFP-Growth algorithm can improve mining efficiency of probabilistic frequent itemsets on uncertain dataset and reduce memory usage to a certain degree.

Reference | Related Articles | Metrics